AIbase

# Reinforcement Learning Tuning

## MiMo-7B-RL
License: MIT
MiMo-7B-RL is a reinforcement learning model trained on top of the MiMo-7B-SFT model; it delivers strong performance on mathematical and code-reasoning tasks, comparable to OpenAI o1-mini.
Tags: Large Language Model, Transformers
Author: XiaomiMiMo · Downloads: 11.79k · Likes: 252
## Meta Llama 3 70B FP8
License: Other
Meta Llama 3 70B is a large language model developed by Meta, with 70 billion parameters and an 8k context length, intended for English-language commercial and research use.
Tags: Large Language Model, Transformers, English
Author: FriendliAI · Downloads: 34 · Likes: 5
## Meta Llama 3 8B Instruct GGUF
This is the GGUF-quantized version of the 8-billion-parameter instruction-tuned model from the Meta Llama 3 series, optimized for dialogue and performing well on multiple benchmarks.
Tags: Large Language Model, English
Author: MaziyarPanahi · Downloads: 293.90k · Likes: 88
## PPO LunarLander-v2
This is a reinforcement learning model trained with the PPO algorithm to solve the landing task in the LunarLander-v2 environment.
Tags: Physics Model
Author: araffin · Downloads: 65 · Likes: 18
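The entry above relies on PPO (Proximal Policy Optimization), whose core idea is a clipped surrogate objective that limits how far each update can move the policy. A minimal NumPy sketch of that objective with illustrative values (not this model's actual training code):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO's clipped surrogate: min(r * A, clip(r, 1-eps, 1+eps) * A).

    ratio     -- new_policy_prob / old_policy_prob per action
    advantage -- estimated advantage per action
    eps       -- clipping range (0.2 is a common default)
    """
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(ratio * advantage, clipped)

# Hypothetical probability ratios with unit advantages:
ratios = np.array([0.5, 1.0, 1.5])
adv = np.array([1.0, 1.0, 1.0])
print(ppo_clip_objective(ratios, adv))  # the 1.5 ratio is clipped to 1 + eps = 1.2
```

The `min` keeps the objective pessimistic: a ratio that drifts outside `[1-eps, 1+eps]` yields no extra reward, which discourages destructively large policy updates.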
© 2025 AIbase